RNA structure
BeeRNA: tertiary structure-based RNA inverse folding using Artificial Bee Colony
Mlaweh, Mehyar, Cazenave, Tristan, Alaya, Ines
The Ribonucleic Acid (RNA) inverse folding problem, designing nucleotide sequences that fold into specific tertiary structures, is a fundamental computational biology problem with important applications in synthetic biology and bioengineering. The design of complex three-dimensional RNA architectures remains computationally demanding and mostly unresolved, as most existing approaches focus on secondary structures. In order to address tertiary RNA inverse folding, we present BeeRNA, a bio-inspired method that employs the Artificial Bee Colony (ABC) optimization algorithm. Our approach combines base-pair distance filtering with RMSD-based structural assessment using RhoFold for structure prediction, resulting in a two-stage fitness evaluation strategy. To guarantee biologically plausible sequences with balanced GC content, the algorithm takes thermodynamic constraints and adaptive mutation rates into consideration. In this work, we focus primarily on short and medium-length RNAs (< 100 nucleotides), a biologically significant regime that includes microRNAs (miRNAs), aptamers, and ribozymes, where BeeRNA achieves high structural fidelity with practical CPU runtimes. The lightweight, training-free implementation will be publicly released for reproducibility, offering a promising bio-inspired approach for RNA design in therapeutics and biotechnology.
- Europe > Austria > Vienna (0.05)
- North America > United States > Michigan (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
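Two of the cheap quantities named in the BeeRNA abstract, GC content and base-pair distance, can be illustrated with a minimal sketch. The function names, the pseudoknot-free dot-bracket input format, and the `max_distance` threshold are assumptions made for illustration; the abstract does not specify them, and the second stage (RMSD against a RhoFold prediction) is not shown.

```python
from __future__ import annotations


def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in an RNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def base_pairs(dot_bracket: str) -> set[tuple[int, int]]:
    """Parse a pseudoknot-free dot-bracket string into a set of (i, j) pairs."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure: unclosed '('")
    return pairs


def base_pair_distance(a: str, b: str) -> int:
    """Count base pairs that occur in exactly one of the two structures."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    return len(base_pairs(a) ^ base_pairs(b))


def passes_prefilter(predicted: str, target: str, max_distance: int = 4) -> bool:
    """Hypothetical first-stage filter: keep candidates close to the target.

    The threshold value is an assumption, not a value from the paper.
    """
    return base_pair_distance(predicted, target) <= max_distance
```

In a two-stage scheme of this kind, only candidates that pass the inexpensive secondary-structure filter would be sent on to the costlier 3D prediction and RMSD comparison.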
A Comprehensive Benchmark for RNA 3D Structure-Function Modeling
Wyss, Luis, Mallet, Vincent, Karroucha, Wissam, Borgwardt, Karsten, Oliver, Carlos
The RNA structure-function relationship has recently garnered significant attention within the deep learning community, promising to grow in importance as nucleic acid structure models advance. However, the absence of standardized and accessible benchmarks for deep learning on RNA 3D structures has impeded the development of models for RNA functional characteristics. In this work, we introduce a set of seven benchmarking datasets for RNA structure-function prediction, designed to address this gap. Our library builds on the established Python library rnaglib, and offers easy data distribution and encoding, splitters and evaluation methods, providing a convenient all-in-one framework for comparing models. Datasets are implemented in a fully modular and reproducible manner, facilitating community contributions and customization. Finally, we provide initial baseline results for all tasks using a graph neural network. Source code: https://github.com/cgoliver/rnaglib Documentation: https://rnaglib.org
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- Europe > Austria > Vienna (0.04)
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
RiboGen: RNA Sequence and Structure Co-Generation with Equivariant MultiFlow
Rubin, Dana, Costa, Allan dos Santos, Ponnapati, Manvitha, Jacobson, Joseph
Ribonucleic acid (RNA) plays fundamental roles in biological systems, from carrying genetic information to performing enzymatic functions. Understanding and designing RNA can enable novel therapeutic applications and biotechnological innovation. To enhance RNA design, in this paper we introduce RiboGen, the first deep learning model to simultaneously generate RNA sequence and all-atom 3D structure. RiboGen combines standard Flow Matching with Discrete Flow Matching in a multimodal data representation. RiboGen is based on Euclidean Equivariant neural networks for efficiently processing and learning three-dimensional geometry. Our experiments show that RiboGen can efficiently generate chemically plausible and self-consistent RNA samples. Our results suggest that co-generation of sequence and structure is a competitive approach for modeling RNA.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
- North America > United States > Texas > Travis County > Austin (0.04)
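As background for the continuous half of the RiboGen abstract, the sketch below shows one common formulation of flow matching on 3D coordinates: a straight-line path between a noise sample `x0` and a data sample `x1`, whose target velocity is `x1 - x0`. This is an assumption about what "standard Flow Matching" refers to; the abstract does not state which probability path RiboGen uses, and all function names here are hypothetical. The discrete (sequence) flow and the equivariant network are not shown.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]
Coords = List[Point]


def interpolate(x0: Coords, x1: Coords, t: float) -> Coords:
    """Point on the straight-line path: x_t = (1 - t) * x0 + t * x1."""
    return [
        tuple((1.0 - t) * a + t * b for a, b in zip(p0, p1))
        for p0, p1 in zip(x0, x1)
    ]


def target_velocity(x0: Coords, x1: Coords) -> Coords:
    """Regression target for the linear path: u = x1 - x0 (constant in t)."""
    return [tuple(b - a for a, b in zip(p0, p1)) for p0, p1 in zip(x0, x1)]


def euler_integrate(
    x0: Coords, velocity_fn: Callable[[Coords, float], Coords], steps: int = 10
) -> Coords:
    """Generate a sample by integrating dx/dt = v(x, t) from t = 0 to t = 1."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [tuple(c + dt * vc for c, vc in zip(p, vp)) for p, vp in zip(x, v)]
    return x
```

During training, a network would be fit to predict `target_velocity(x0, x1)` from `interpolate(x0, x1, t)` and `t`; at sampling time, the learned network takes the place of `velocity_fn` in `euler_integrate`.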
Accurate RNA 3D structure prediction using a language model-based deep learning approach
Shen, Tao, Hu, Zhihang, Sun, Siqi, Liu, Di, Wong, Felix, Wang, Jiuming, Chen, Jiayang, Wang, Yixuan, Hong, Liang, Xiao, Jin, Zheng, Liangzhen, Krishnamoorthi, Tejas, King, Irwin, Wang, Sheng, Yin, Peng, Collins, James J., Li, Yu
Accurate prediction of RNA three-dimensional (3D) structure remains an unsolved challenge. Determining RNA 3D structures is crucial for understanding their functions and informing RNA-targeting drug development and synthetic biology design. The structural flexibility of RNA, which leads to scarcity of experimentally determined data, complicates computational prediction efforts. Here, we present RhoFold+, an RNA language model-based deep learning method that accurately predicts 3D structures of single-chain RNAs from sequences. By integrating an RNA language model pre-trained on ~23.7 million RNA sequences and leveraging techniques to address data scarcity, RhoFold+ offers a fully automated end-to-end pipeline for RNA 3D structure prediction. Retrospective evaluations on RNA-Puzzles and CASP15 natural RNA targets demonstrate RhoFold+'s superiority over existing methods, including human expert groups. Its efficacy and generalizability are further validated through cross-family and cross-type assessments, as well as time-censored benchmarks. Additionally, RhoFold+ predicts RNA secondary structures and inter-helical angles, providing empirically verifiable features that broaden its applicability to RNA structure and function studies.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Hong Kong (0.05)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design
Anand, Rishabh, Joshi, Chaitanya K., Morehead, Alex, Jamasb, Arian R., Harris, Charles, Mathis, Simon V., Didi, Kieran, Hooi, Bryan, Liò, Pietro
We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We build upon SE(3) flow matching for protein backbone generation and establish protocols for data preparation and evaluation to address unique challenges posed by RNA modeling. We formulate RNA structures as a set of rigid-body frames and associated loss functions which account for larger, more conformationally flexible RNA backbones (13 atoms per nucleotide) vs. proteins (4 atoms per residue). Toward tackling the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations. Additionally, we define a suite of evaluation metrics to measure whether the generated RNA structures are globally self-consistent (via inverse folding followed by forward folding) and locally recover RNA-specific structural descriptors. The most performant version of RNA-FrameFlow generates locally realistic RNA backbones of 40-150 nucleotides, over 40% of which pass our validity criteria as measured by a self-consistency TM-score >= 0.45, at which two RNAs have the same global fold. Open-source code: https://github.com/rish-16/rna-backbone-design
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Missouri (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
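The validity criterion described in the RNA-FrameFlow abstract, the share of generated backbones whose self-consistency TM-score is at least 0.45, reduces to a simple thresholded fraction. The sketch below assumes the scores have already been computed by an external inverse-folding and forward-folding pipeline; the function name is hypothetical.

```python
from typing import Iterable


def validity_rate(sc_tm_scores: Iterable[float], threshold: float = 0.45) -> float:
    """Fraction of samples whose self-consistency TM-score meets the threshold.

    The 0.45 default follows the abstract; how the scores are produced
    is outside the scope of this sketch.
    """
    scores = list(sc_tm_scores)
    if not scores:
        raise ValueError("at least one score is required")
    return sum(score >= threshold for score in scores) / len(scores)
```

For example, with the made-up placeholder scores `[0.2, 0.45, 0.6, 0.3]`, two of the four samples meet the threshold, so the function returns `0.5`.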
3D-based RNA function prediction tools in rnaglib
Oliver, Carlos, Mallet, Vincent, Waldispühl, Jérôme
Understanding the connection between complex structural features of RNA and biological function is a fundamental challenge in evolutionary studies and in RNA design. However, building datasets of RNA 3D structures and making appropriate modeling choices remain time-consuming and lack standardization. In this chapter, we describe the use of rnaglib to train supervised and unsupervised machine learning-based function prediction models on datasets of RNA 3D structures.
#ICML2023 invited talk: Jennifer Doudna on machine learning for biological research
The programme of the International Conference on Machine Learning (ICML) featured an invited talk by Jennifer Doudna entitled "The future of ML in biology: CRISPR for health and climate". Jennifer Doudna and Emmanuelle Charpentier won the 2020 Nobel Prize in Chemistry for "the development of a method for genome editing". The method in question is often referred to as CRISPR/Cas9 genetic scissors. Using this technique, researchers can change the DNA of animals, plants and microorganisms with extremely high precision. This technology has already had a huge impact on the biological sciences.
Physics-aware Graph Neural Network for Accurate RNA 3D Structure Prediction
Zhang, Shuo, Liu, Yang, Xie, Lei
Biological functions of RNAs are determined by their three-dimensional (3D) structures. Thus, given the limited number of experimentally determined RNA structures, the prediction of RNA structures will facilitate elucidating RNA functions and RNA-targeted drug discovery, but remains a challenging task. In this work, we propose a Graph Neural Network (GNN)-based scoring function trained only with the atomic types and coordinates on limited solved RNA 3D structures for distinguishing accurate structural models. The proposed Physics-aware Multiplex Graph Neural Network (PaxNet) separately models the local and non-local interactions inspired by molecular mechanics. Furthermore, PaxNet contains an attention-based fusion module that learns the individual contribution of each interaction type for the final prediction. We rigorously evaluate the performance of PaxNet on two benchmarks and compare it with several state-of-the-art baselines. The results show that PaxNet significantly outperforms all the baselines overall, and demonstrate the potential of PaxNet for improving the 3D structure modeling of RNA and other macromolecules. Our code is available at https://github.com/zetayue/Physics-aware-Multiplex-GNN.
Hierarchical Data-efficient Representation Learning for Tertiary Structure-based RNA Design
Tan, Cheng, Zhang, Yijie, Gao, Zhangyang, Cao, Hanqun, Li, Stan Z.
While artificial intelligence has made remarkable strides in revealing the relationship between biological macromolecules' primary sequence and tertiary structure, designing RNA sequences based on specified tertiary structures remains challenging. Though existing approaches in protein design have thoroughly explored structure-to-sequence dependencies in proteins, RNA design still confronts difficulties due to structural complexity and data scarcity. Adding to the problem, direct transplantation of protein design methodologies into RNA design fails to achieve satisfactory outcomes, even though proteins and RNA share similar structural components. In this study, we aim to systematically construct a data-driven RNA design pipeline. We crafted a large, well-curated benchmark dataset and designed a comprehensive structural modeling approach to represent the complex RNA tertiary structure. More importantly, we proposed a hierarchical data-efficient representation learning framework that learns structural representations through contrastive learning at both cluster-level and sample-level to fully leverage the limited data. By constraining data representations within a limited hyperspherical space, the intrinsic relationships between data points could be explicitly imposed. Moreover, we incorporated extracted secondary structures with base pairs as prior knowledge to facilitate the RNA design process. Extensive experiments demonstrate the effectiveness of our proposed method, providing a reliable baseline for future RNA design tasks. The source code and benchmark dataset will be released publicly.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Bay Area biotech uses AI to unlock RNA structures -- and find new therapies in the process
From Covid-19 vaccines to therapies against a range of deadly diseases, every RNA molecule has a complex 3D shape that controls its function. A new Bay Area biotech is harnessing artificial intelligence to better understand and predict these structures in hopes of developing new therapeutics. The South San Francisco company, Atomic AI, was started in May 2021 and is still young. So is its CEO, Raphael Townshend, a Stanford computer science Ph.D. who just turned 30. But the startup already has big ambitions for how its AI-based approach can both help biotechs identify RNA-targeting drugs and design RNA to be stable, compact, or to have other desired properties.